Abstract

Many real-world datasets can be represented in the form of a graph whose edge
weights designate similarities between instances. A discrete Gaussian random
field (GRF) model is a finite-dimensional Gaussian process (GP) whose prior
covariance is the inverse of a graph Laplacian. Minimizing the trace of the
predictive covariance $\Sigma$ (V-optimality) on GRFs has proven successful in batch active
learning classification problems with budget constraints. However, its worst-case
bound has been missing. We show that the V-optimality on GRFs as a function
of the batch query set is submodular and hence its greedy selection algorithm
guarantees a $(1 - 1/e)$ approximation ratio. Moreover, GRF models satisfy the
absence-of-suppressor (AofS) condition. For active survey problems, we propose
a similar survey criterion which minimizes $\mathbf{1}^T \Sigma \mathbf{1}$. In practice, the V-optimality
criterion performs better than GPs with mutual information gain criteria and allows
nonuniform costs for different nodes.

1 Introduction 

In many real-world applications, such as author classification based on coauthorship graphs, one 
or more output variables need to be predicted from a subset of queryable inputs, constrained by a 
budget. In batch active learning applications, an algorithm refines its prediction by generating a list 
of queries for domain experts to answer [2, 5, 8]. In both cases, we consider the situation where 
the similarities between all instances, both labeled and unlabeled, are known a priori. We formulate
these similarities by a graph G = (V, E) with edge weights W. The goal is to optimize the subset of 
nodes to query within the budget so that the risk in prediction can be minimized. One common risk 
is the predictive variance, measured by the trace of the covariance matrix of multivariate outputs. 
Minimizing this risk is known as the V-optimality criterion. 

Commonly used models for these subset selection or batch active learning problems are discrete 
Gaussian random fields (GRF) [2, 5], finite-dimensional Gaussian processes (GP) [1], and linear 
regression with prior knowledge of covariances [3, 4]. GRFs formulate the input-output
correspondence by the conditional distribution of a (possibly improper) Gaussian prior whose inverse
covariance is set to be the graph Laplacian, sometimes with diagonal regularization. Finite-dimensional GPs
define the prior as $\mathcal{N}(0, W)$, where $W$ is an arbitrary covariance matrix. Finally, linear regression
with prior knowledge of covariance is essentially a finite-dimensional GP with linear covariance.

GPs have been used as a base model for both subset selection and active learning [1]. One minor
issue is that they require $W$ to be positive-semidefinite. A major issue, however, is that they do not
have a provable lower bound for optimality [4]. Instead, [1] used an alternative mutual information
gain (MIG) criterion for selecting nodes for query. The MIG criterion is naturally a normalized,
monotone, and submodular function. As a result, a greedy algorithm guarantees a $(1 - 1/e)$
approximation ratio. However, there is no classification-related risk function associated with it, and the
log-determinants of covariance submatrices are sensitive to small eigenvalues, which can be a problem.

Another direction with GP models is to constrain the prior kernel matrix. [4] constrained the
prior covariance matrix such that its diagonal entries are 1s and its off-diagonal entries are some very
small values. However, these models can be approximated by regularized GRF models in that

$$ (I + \epsilon W)^{-1} = I - \epsilon W + \epsilon^2 W^2 - \cdots \approx I - \epsilon W, \quad \text{when } \lim_{s \to \infty} \epsilon^s W^s = 0, \text{ with small } \epsilon > 0. $$

[4] also proposed an absence-of-suppressors (AofS) condition that is sufficient for submodularity. 
However, it is generally hard to verify if a discrete GP meets the AofS condition, whereas we will 
show that every GRF is AofS. 

Finally, [5] demonstrates semi-supervised and active learning using GRFs. Their motivation is that 
unlabeled nodes can reasonably influence the prediction by their edge weights with other nodes, 
because these weights can encode information such as sample density (e.g. using a radial basis 
function kernel to calculate the weight matrix). Later research [1] used spectral methods to boost 
the computation speed for subset selection in batch active learning. However, they only solved 
the subset selection case where every node query has a unit cost. Moreover, in both works, the 
optimization lacks worst-case guarantees. 

In this paper, we properly define a (regularized) discrete GRF model and prove a $(1 - 1/e)$
approximation ratio lower bound with the V-optimality criterion under a limited budget for a greedy subset
selection algorithm. We also extend this bound to the scenario where different nodes have different
costs. GRF models are a special type of AofS GP models. Conversely, any GP model whose
conditional covariance matrices are always nonnegative is a GRF model and is AofS. From real-world
experiments we show that GRF models using the V-optimality criterion present advantages over GP
models with the MIG criterion and random selection.



2 Gaussian Random Fields and Subset Selection Problems 
2.1 The Gaussian Random Field (GRF) model 

Suppose the dataset can be represented in the form of a connected undirected graph $G = (V, E)$
where each node has an (either known or unknown) label and each edge has a fixed nonnegative
weight $w_{ij} (= w_{ji})$ that reflects the proximity, similarity, etc. between nodes $v_i$ and $v_j$. Define
the graph Laplacian of $G$ to be $L_0 = \mathrm{diag}(W \mathbf{1}) - W$ and the regularized graph Laplacian to be
$L_\sigma = L_0 + \mathrm{diag}(\sigma_1^{-2}, \dots, \sigma_N^{-2})$ with $\sigma_i > 0$, $\forall i = 1, \dots, N$. We use $L$ to generalize both.

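The discrete Gaussian Random Field (GRF) is a joint continuous distribution on both labeled and
unlabeled nodes, containing one tunable "heat" parameter $\beta > 0$, as

$$ P(y) \propto \exp\left(-\frac{1}{2\beta}\, y^T L y\right) =
\begin{cases}
\exp\left(-\frac{1}{4\beta} \sum_{i,j} w_{ij} (y_i - y_j)^2\right) & \text{(unregularized)} \\
\exp\left(-\frac{1}{4\beta} \sum_{i,j} w_{ij} (y_i - y_j)^2 - \frac{1}{2\beta} \sum_i \frac{y_i^2}{\sigma_i^2}\right) & \text{(regularized).}
\end{cases} \tag{1} $$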

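Assuming labels $y_\ell = \{y_{\ell_1}, \dots, y_{\ell_{|\mathcal{L}|}}\}$ are tagged as $t_\ell \in [0, 1]^{|\mathcal{L}|}$, a Gaussian harmonic predictor
predicts all unlabeled continuous nodes $y_u = \{y_{u_1}, \dots, y_{u_{|\mathcal{U}|}}\}$ by factoring out known variables [5],

$$ P(y_u \mid y_\ell = t_\ell) \sim \mathcal{N}(f_u, \beta L_u^{-1}) = \mathcal{N}(f_u, \beta L_{(V - \mathcal{L})}^{-1}), \tag{2} $$

where $L_u$ is the submatrix consisting of the unlabeled row and column indices in $L$, for example the
lower right block of $L = \begin{pmatrix} L_\ell & L_{\ell u} \\ L_{u\ell} & L_u \end{pmatrix}$, and $f_u = -L_u^{-1} L_{u\ell} t_\ell$. By convention, $L_{(V - \mathcal{L})}^{-1}$ means
the inverse of the submatrix. We use $L_{(V - \mathcal{L})}$ and $L_u$ interchangeably because $\mathcal{L}$ and $\mathcal{U}$ partition the
set of all nodes $V$.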

In some problems, a test set $\mathcal{T} \subseteq \mathcal{U}$ is specified. Define $T$ to be a $|\mathcal{T}| \times |\mathcal{U}|$ matrix such that
$T_{ij} = \delta(v_{t_i}, v_{u_j})$, i.e. $T y_u = y_\mathcal{T}$. Otherwise, a default value of $T$ is the identity matrix of size $|\mathcal{U}|$. By
marginalizing out node variables in $\mathcal{U} \setminus \mathcal{T}$ from (2), we have $P(y_\mathcal{T} \mid t_\ell) \sim \mathcal{N}(T f_u, \beta T L_{(V - \mathcal{L})}^{-1} T^T)$.

Notice that GRFs differ from general GPs in that the predictive mean $f_u \in [0, 1]^{|\mathcal{U}|}$ (Corollary
1). Unlike GPs, GRFs do not "squeeze" regression responses to [0, 1] to get probability predictions.

2.2 Risk Minimization for Classification

Since in GRFs regression responses are taken directly as probability predictions, it is computationally
and analytically more convenient to apply the regression loss and risk directly in the GRF as in
[2]. Assume the L2 loss to be our classification loss, $L_c(y_\mathcal{T}, f_\mathcal{T}) = \sum_{t_i \in \mathcal{T}} (y_{t_i} - f_{t_i})^2$, and a risk
function whose input variable is the subset $\mathcal{L}$ as

$$ R_c(\mathcal{L}) = \mathbb{E}_{y_\ell, y_\mathcal{T}} L_c(y_\mathcal{T}, f_\mathcal{T} \mid y_\ell) = \mathbb{E}\, \mathbb{E}\Big[ \sum_{t_i \in \mathcal{T}} \big( y_{t_i} + (T L_u^{-1} L_{u\ell} y_\ell)_i \big)^2 \,\Big|\, y_\ell \Big] = \mathrm{tr}(T L_u^{-1} T^T). \tag{3} $$
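As an illustration of the harmonic predictor (2) and the risk (3), the following minimal NumPy sketch computes the predictive mean $f_u$ and the trace risk on a toy graph. The function name `harmonic_predict`, the 4-node path graph, and the choice $T = I$ are assumptions made for this example only, and the constant factor $\beta$ is omitted.

```python
import numpy as np

def harmonic_predict(W, labeled, t, sigma=None):
    """Harmonic mean f_u and L_u^{-1} for the unlabeled nodes (hypothetical helper)."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W                      # unregularized graph Laplacian
    if sigma is not None:                               # optional diagonal regularization
        L = L + np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)
    unlabeled = np.setdiff1d(np.arange(L.shape[0]), labeled)
    L_u = L[np.ix_(unlabeled, unlabeled)]
    L_ul = L[np.ix_(unlabeled, labeled)]
    cov = np.linalg.inv(L_u)                            # covariance up to the factor beta
    f_u = -cov @ L_ul @ np.asarray(t, dtype=float)      # f_u = -L_u^{-1} L_ul t_l
    return unlabeled, f_u, cov

# Path graph 0-1-2-3 with the two endpoints labeled 0 and 1.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
unlabeled, f_u, cov = harmonic_predict(W, labeled=[0, 3], t=[0.0, 1.0])
risk = np.trace(cov)                                    # V-optimality risk with T = I
print(unlabeled, f_u, risk)
```

On this path graph the predictions interpolate linearly between the two labeled endpoints, and they stay inside [0, 1] as Corollary 1 states.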



2.3 The Subset Selection Problem (the Active Learning for Classification Problem) 

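Assume every vertex on the graph has a cost (a unit cost if not specified). The major objective in this
paper is to choose a subset of nodes $\mathcal{L} = \{v_{\ell_1}, \dots, v_{\ell_{|\mathcal{L}|}}\}$ to query for labels, constrained by a given
budget $C$, such that the risk is minimized. Formally,

$$ \arg\min_{\mathcal{L}} R(\mathcal{L}) = R_c(\mathcal{L}) = \mathrm{tr}(T L_{(V - \mathcal{L})}^{-1} T^T) \quad \text{s.t. } \sum_{v \in \mathcal{L}} c_v \le C. \tag{4} $$

Though not explicitly denoted, the specific matrix $T = (\delta(v_{t_i}, v_{u_j}))_{i,j=1}^{|\mathcal{T}|, |\mathcal{U}|}$ depends on $\mathcal{U} = V - \mathcal{L}$.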



3 Submodularity, Suppressor-free, and Bounds for Greedy Method 

In § 3, we assume that $L$ is nonsingular. This could be achieved by either deleting a node (a row and
a column) from the original undirected connected graph Laplacian, i.e. assuming that the dataset
always contains a fixed label, or by using the regularized $L_\sigma$. In these cases, $L$ satisfies the following.

• $L$ has proper signs, i.e. $l_{ij} \ge 0$ if $i = j$ and $l_{ij} \le 0$ if $i \ne j$; (5)

• $L$ is undirected and connected, i.e. $l_{ij} = l_{ji}$ $\forall i, j$ and $\sum_{j \ne i} (-l_{ij}) > 0$ $\forall i$; (6)

• Node degree is no less than the total weight of incident edges, i.e. $\sum_j l_{ij} = \sum_j l_{ji} \ge 0$ $\forall i = 1, \dots, N$; (7)

• $L$ is nonsingular and therefore positive definite, i.e. $\exists i$ s.t. $\sum_j l_{ij} = \sum_j l_{ji} > 0$. (8)

Conversely, our results hold if a finite-dimensional GP has covariance $\Sigma = L^{-1}$ and $L$ satisfies (5-8).



3.1 Major results 

• Submodularity. Under conditions (5-8), the risk reduction function $R_\Delta(\mathcal{L}) := R(\emptyset) - R(\mathcal{L})$ is
normalized, monotone, and submodular, i.e.,

$$ R_\Delta(\emptyset) = 0 \tag{9} $$

$$ R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2) \ge R_\Delta(\mathcal{L}_1) \tag{10} $$

$$ R_\Delta(\mathcal{L}_1 \cup \{v\}) - R_\Delta(\mathcal{L}_1) \ge R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\}) - R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2) \tag{11} $$

$\forall \mathcal{L}_1, \mathcal{L}_2, v$.

• Greedy Algorithm and near-optimal bounds. The optimization problem (4) is NP-hard. If (9-11) are
satisfied, however, the greedy selection algorithm (Algo 1) produces a query set $\mathcal{L}_g$ that guarantees a
$(1 - 1/e)$ optimality bound [6],

$$ R_\Delta(\mathcal{L}_g) \ge \left(1 - \frac{1}{e}\right) \cdot R_\Delta(\mathcal{L}^*), \tag{12} $$

where $\mathcal{L}^*$ is the global (NP) optimizer under the constraint $\sum_{v \in \mathcal{L}^*} c_v \le \sum_{v \in \mathcal{L}_g} c_v$.

• Relationship with suppressor-free models. An absence-of-suppressor (AofS) condition in regression
models guarantees submodularity. With our notation,¹ this condition is
$|\mathrm{Corr}(y_i, y_j \mid \mathcal{L}_1 \cup \mathcal{L}_2)| \le |\mathrm{Corr}(y_i, y_j \mid \mathcal{L}_1)|$, $\forall v_i, v_j, \mathcal{L}_1, \mathcal{L}_2$. An example of a suppressor variable is some node
$v_k \in \mathcal{L}_2 - \mathcal{L}_1$ such that $y_i + y_j = y_k$. Such a variable is counterintuitive in prediction models
because knowing $y_k$ suppresses an unmodeled correlation between the predictors. We show that the
GRF model is a perfect example of the AofS condition.

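¹ $|\mathrm{Corr}(Z, \mathrm{Res}(X_i, S)/\mathrm{Res}(X_j, S))| \le |\rho(Z, \mathrm{Res}(X_i, S))|$ in the original paper.

Algorithm 1: Greedy subset selection. Fast realization of * in [2], [5], and § 4.

Input: Node costs $c_v$, budget $C$, queryable pool $\mathcal{P}$, objective function $R(\mathcal{L})$.
Output: A subset $\mathcal{L} \subseteq \mathcal{P}$ by greedy selection.
Define $\mathcal{L} \leftarrow \emptyset$, $R_{old} \leftarrow R(\emptyset)$.
while available pool $\mathcal{P}' = \{v' \in \mathcal{P} - \mathcal{L} : c_{v'} + \sum_{v \in \mathcal{L}} c_v \le C\}$ is not empty do
    * Find $v \leftarrow \arg\min_{v' \in \mathcal{P}'} \dfrac{R(\mathcal{L} \cup \{v'\}) - R_{old}}{c_{v'}}$.
    Update $\mathcal{L} \leftarrow \mathcal{L} \cup \{v\}$, $R_{old} \leftarrow R(\mathcal{L})$.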
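The loop of Algorithm 1 can be sketched as follows. This is a naive illustration, not the fast realization referenced above: the helper names `v_risk` and `greedy_select` are hypothetical, every candidate is scored by a fresh matrix inversion, and the test set is assumed to be all unlabeled nodes ($T = I$).

```python
import numpy as np

def v_risk(L, labeled):
    """V-optimality risk tr(L_u^{-1}) over the unlabeled nodes (T = I)."""
    u = [i for i in range(L.shape[0]) if i not in labeled]
    if not u:
        return 0.0
    return float(np.trace(np.linalg.inv(L[np.ix_(u, u)])))

def greedy_select(L, costs, budget, risk=v_risk):
    """Repeatedly add the affordable node with the best risk change per unit cost."""
    chosen, spent = [], 0.0
    r_old = risk(L, chosen)
    while True:
        pool = [v for v in range(L.shape[0])
                if v not in chosen and spent + costs[v] <= budget]
        if not pool:
            break
        # argmin of (R(L ∪ {v'}) - R_old) / c_{v'}, as in the starred step
        best = min(pool, key=lambda v: (risk(L, chosen + [v]) - r_old) / costs[v])
        chosen.append(best)
        spent += costs[best]
        r_old = risk(L, chosen)
    return chosen, r_old

# Regularized Laplacian of a 5-node path graph with unit costs.
W = np.diag(np.ones(4), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W + 0.1 * np.eye(5)
chosen, final_risk = greedy_select(L, costs=np.ones(5), budget=2)
print(chosen, final_risk)
```

On this symmetric path the first query is the middle node, which splits the graph into two short chains.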



3.2 Proofs 

Lemma 1. For any $L$ satisfying (5-8), the inverse of $L$ is nonnegative, i.e. $L^{-1} \ge 0$ (entry-wise).²

Proof. Define $D = \mathrm{diag}(L)$ and $W = D - L$, so that $L = D - W = D(I - D^{-1}W)$.
According to (5), entry-wise $D \ge 0$, $W \ge 0$ and $D^{-1}W \ge 0$. Furthermore, by (7),

$$ 0 \le D^{-1}W = \left(\frac{w_{ij}}{d_{ii}}\right)_{i,j=1}^{N} \le \left(\frac{w_{ij}}{\sum_k w_{ik}}\right)_{i,j=1}^{N}, \tag{13} $$

$$ \|D^{-1}W\|_\infty := \sup_{x \ne 0} \frac{\max_i |(D^{-1}Wx)_i|}{\max_i |x_i|} = \max_i \sum_j |(D^{-1}W)_{ij}| \le \max_i \sum_j \frac{w_{ij}}{\sum_k w_{ik}} \le 1. \tag{14} $$

Thus, any eigenvalue $\lambda_k$ and its corresponding eigenvector $v_k$ of $D^{-1}W$ need to satisfy

$$ |\lambda_k| \, \|v_k\|_\infty = \|\lambda_k v_k\|_\infty = \|D^{-1}W v_k\|_\infty \le \|v_k\|_\infty, \quad \text{i.e. } |\lambda_k| \le 1 \ \ \forall k = 1, \dots, N. $$

Moreover, by (8), the invertibility of $L$ implies that $(I - D^{-1}W)$ is invertible, i.e. $D^{-1}W$ has no
eigenvalue equal to 1. Hence, $|\lambda_k| < 1$, $\forall k = 1, \dots, N$, and $\lim_{n \to \infty} (D^{-1}W)^n = 0$. The latter yields the following,

$$ L^{-1} = (I - D^{-1}W)^{-1} D^{-1} = [I + D^{-1}W + (D^{-1}W)^2 + \cdots] D^{-1}. \tag{15} $$

Since every term on the right-hand side of (15) is nonnegative, $L^{-1}$ is also nonnegative. □
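The argument of Lemma 1 can be checked numerically. The sketch below assumes a small random dense graph with a uniform regularizer of 0.5 (both arbitrary choices for illustration) and compares a truncated version of the series (15) with the directly computed inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = np.triu(rng.random((n, n)), 1)
W = W + W.T                                         # symmetric weights, zero diagonal
L = np.diag(W.sum(axis=1)) - W + 0.5 * np.eye(n)    # regularized Laplacian, satisfies (5-8)

D_inv = np.diag(1.0 / np.diag(L))
P = D_inv @ (np.diag(np.diag(L)) - L)               # D^{-1} W from the proof
spectral_radius = np.max(np.abs(np.linalg.eigvals(P)))

# Truncated Neumann series I + P + P^2 + ... from (15)
series = np.zeros((n, n))
term = np.eye(n)
for _ in range(500):
    series += term
    term = term @ P
L_inv_series = series @ D_inv
L_inv = np.linalg.inv(L)

print(spectral_radius, L_inv.min())
```

Every term of the series is entry-wise nonnegative, so agreement with `np.linalg.inv` illustrates why the inverse has no negative entries.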

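Corollary 1. The GRF prediction operator $-L_u^{-1} L_{u\ell}$ maps $y_\ell \in [0, 1]^{|\mathcal{L}|}$ to $f_u = -L_u^{-1} L_{u\ell} y_\ell \in [0, 1]^{|\mathcal{U}|}$.

Proof. Since $L_u^{-1} \ge 0$ and $-L_{u\ell} \ge 0$, we have $y_\ell \ge 0 \Rightarrow (-L_{u\ell}) y_\ell \ge 0$ and $y_\ell \ge y'_\ell \Rightarrow
L_u^{-1}(-L_{u\ell}) y_\ell \ge L_u^{-1}(-L_{u\ell}) y'_\ell$. On the other hand, $(L_u, L_{u\ell}) \cdot \mathbf{1} \ge 0$ and $L_u^{-1} \ge 0$ imply
$L_u^{-1}(L_u, L_{u\ell}) \cdot \mathbf{1} \ge 0$, i.e. $\mathbf{1} + L_u^{-1} L_{u\ell} \mathbf{1} \ge 0$. Hence, $\mathbf{1} \ge -L_u^{-1} L_{u\ell} \mathbf{1} \ge -L_u^{-1} L_{u\ell} y_\ell$. □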

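Lemma 2. Suppose $L = \begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix}$ satisfies (5-8), then $L^{-1} - \begin{pmatrix} L_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix}$ is
positive-semidefinite and nonnegative.

Proof. By the block matrix inversion theorem,

$$ L^{-1} - \begin{pmatrix} L_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} -L_{11}^{-1} L_{12} \\ I \end{pmatrix} (L_{22} - L_{21} L_{11}^{-1} L_{12})^{-1} \begin{pmatrix} -L_{21} L_{11}^{-1} & I \end{pmatrix}. \tag{16} $$

By assumption (8), $L^{-1}$ is positive-definite, and so is its lower right corner $(L_{22} - L_{21} L_{11}^{-1} L_{12})^{-1}$.
Thus, $L^{-1} - \begin{pmatrix} L_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix}$ is positive-semidefinite.

By Lemma 1, $L^{-1} \ge 0$, and this implies that its lower right corner $(L_{22} - L_{21} L_{11}^{-1} L_{12})^{-1} \ge 0$. The submatrix
$L_{11}$ also satisfies (5-8) and by Lemma 1, $L_{11}^{-1} \ge 0$. By sign rule (5), $(-L_{12}) = (-L_{21})^T \ge 0$.
Now that every term on the right side of (16) is nonnegative, the left side also has to be. □

Lemma 3 (Monotonicity). For the function $R_\Delta(\mathcal{L})$ defined in § 3.1, $R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2) \ge R_\Delta(\mathcal{L}_1)$, $\forall \mathcal{L}_1, \mathcal{L}_2$.

Proof. Direct application of Lemma 2. □

² In the following, for any vector or matrix $A$, $A \ge 0$ always stands for $A$ being (entry-wise) nonnegative.

Lemma 4 (Submodularity). For the function $R_\Delta(\mathcal{L})$ defined in § 3.1, $R_\Delta(\mathcal{L}_1 \cup \{v\}) - R_\Delta(\mathcal{L}_1) \ge
R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\}) - R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2)$, $\forall \mathcal{L}_1, \mathcal{L}_2, v$.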

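Proof. We may assume that $\mathcal{L}_1$, $\mathcal{L}_2$, and $\{v\}$ are disjoint. Without loss of generality, suppose

$$ L_{(V - \mathcal{L}_1)} = \begin{pmatrix}
L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\})} & L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\}), \mathcal{L}_2} & L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\}), \{v\}} \\
L_{\mathcal{L}_2, (V - \mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\})} & L_{\mathcal{L}_2} & L_{\mathcal{L}_2, \{v\}} \\
L_{\{v\}, (V - \mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\})} & L_{\{v\}, \mathcal{L}_2} & L_{\{v\}}
\end{pmatrix} \tag{17} $$

$$ := \begin{pmatrix} \hat{A} & Y & \hat{b} \\ Y^T & Z & b_2 \\ \hat{b}^T & b_2^T & c \end{pmatrix} := \begin{pmatrix} A & b \\ b^T & c \end{pmatrix}, \tag{18} $$

and

$$ L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2)} = \begin{pmatrix}
L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\})} & L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\}), \{v\}} \\
L_{\{v\}, (V - \mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\})} & L_{\{v\}}
\end{pmatrix} = \begin{pmatrix} \hat{A} & \hat{b} \\ \hat{b}^T & c \end{pmatrix}. \tag{19} $$

Apply the block matrix inversion theorem. When a test set is not specified, i.e. $T = I$ of size $|\mathcal{U}|$,

$$ R_\Delta(\mathcal{L}_1 \cup \{v\}) - R_\Delta(\mathcal{L}_1) = R(\mathcal{L}_1) - R(\mathcal{L}_1 \cup \{v\}) = \mathrm{tr}\begin{pmatrix} A & b \\ b^T & c \end{pmatrix}^{-1} - \mathrm{tr}(A^{-1}) = \frac{(-b^T A^{-1})(A^{-1}(-b)) + 1}{c - b^T A^{-1} b}. \tag{20} $$

Similarly, $R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\}) - R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2) = \dfrac{(-\hat{b}^T \hat{A}^{-1})(\hat{A}^{-1}(-\hat{b})) + 1}{c - \hat{b}^T \hat{A}^{-1} \hat{b}}$.

Notice that by sign rule (5), $-b \ge \begin{pmatrix} -\hat{b} \\ 0 \end{pmatrix} \ge 0$, and by Lemma 2, $A^{-1} \ge \begin{pmatrix} \hat{A}^{-1} & 0 \\ 0 & 0 \end{pmatrix} \ge 0$.
Thus,

$$ (-b^T) A^{-1} (-b) \ge (-\hat{b}^T, 0)\, A^{-1} \begin{pmatrix} -\hat{b} \\ 0 \end{pmatrix} \ge (-\hat{b}^T, 0) \begin{pmatrix} \hat{A}^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} -\hat{b} \\ 0 \end{pmatrix} = (-\hat{b}^T) \hat{A}^{-1} (-\hat{b}), $$

$$ A^{-1}(-b) \ge A^{-1} \begin{pmatrix} -\hat{b} \\ 0 \end{pmatrix} \ge \begin{pmatrix} \hat{A}^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} -\hat{b} \\ 0 \end{pmatrix} = \begin{pmatrix} \hat{A}^{-1}(-\hat{b}) \\ 0 \end{pmatrix}. $$

Hence the numerator of (20) is no smaller, and its positive denominator no larger, than those of its
counterpart for $\mathcal{L}_1 \cup \mathcal{L}_2$. The proof when a test set $\mathcal{T}$ is specified is fundamentally
similar because the indicator matrix $T$ is always applied to nonnegative vectors or matrices. □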

Theorem 1 ((1 - 1/e) Bound). The function $R_\Delta(\mathcal{L})$ defined in § 3.1 is normalized (by definition),
monotone (by Lemma 3), and submodular (by Lemma 4). Therefore, (12) can be established. □
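On a graph small enough for exhaustive search, both the diminishing-returns property (11) and the bound (12) can be checked directly. The sketch below is illustrative only: the random 6-node graph, the regularizer 0.3, unit costs, and the helper names `risk`, `gain`, and `diminishing_returns_ok` are all assumptions of this example.

```python
import itertools
import numpy as np

def risk(L, labeled):
    """V-optimality risk tr(L_u^{-1}) with T = I."""
    u = [i for i in range(L.shape[0]) if i not in labeled]
    return float(np.trace(np.linalg.inv(L[np.ix_(u, u)]))) if u else 0.0

def gain(L, S):
    """Risk reduction R_Delta(S) = R(empty set) - R(S)."""
    return risk(L, []) - risk(L, list(S))

def diminishing_returns_ok(L, tol=1e-9):
    """Exhaustively test inequality (11) for all A ⊆ B and v outside B."""
    nodes = range(L.shape[0])
    subsets = [set(c) for r in range(L.shape[0])
               for c in itertools.combinations(nodes, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for v in nodes:
                if v in B:
                    continue
                lhs = gain(L, A | {v}) - gain(L, A)
                rhs = gain(L, B | {v}) - gain(L, B)
                if lhs < rhs - tol:
                    return False
    return True

rng = np.random.default_rng(1)
n, k = 6, 2
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W + 0.3 * np.eye(n)

S = []                                   # greedy with unit costs
for _ in range(k):
    S.append(max((v for v in range(n) if v not in S),
                 key=lambda v: gain(L, S + [v])))
best = max(gain(L, c) for c in itertools.combinations(range(n), k))
ratio = gain(L, S) / best                # greedy gain relative to the optimum
submodular = diminishing_returns_ok(L)
print(ratio, submodular)
```

The ratio printed here is specific to this toy graph; the theorem only guarantees that it is at least $1 - 1/e$.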

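Definition 1. Since the conditional covariance of a GRF model is $L_{(V - \mathcal{L})}^{-1}$, we can properly define the
corresponding conditional correlation to be

$$ \mathrm{Corr}(y_u \mid \mathcal{L}) = \mathrm{diag}(L_{(V - \mathcal{L})}^{-1})^{-1/2} \, L_{(V - \mathcal{L})}^{-1} \, \mathrm{diag}(L_{(V - \mathcal{L})}^{-1})^{-1/2}. \tag{21} $$

Theorem 2 (AofS). $\mathrm{Corr}(y_i, y_j \mid \mathcal{L}_1) \ge \mathrm{Corr}(y_i, y_j \mid \mathcal{L}_1 \cup \mathcal{L}_2)$, $\forall \mathcal{L}_1, \mathcal{L}_2$, $\forall v_i, v_j \notin \mathcal{L}_1 \cup \mathcal{L}_2$.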



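Proof. We may assume that $\mathcal{L}_1$ and $\mathcal{L}_2$ are disjoint. Adopt the notations from (17-19) with $v = v_j$. Now,

$$ \begin{pmatrix} C & d \\ d^T & e \end{pmatrix} := \begin{pmatrix} A & b \\ b^T & c \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \frac{1}{c - b^T A^{-1} b} \begin{pmatrix} A^{-1} b b^T A^{-1} & -A^{-1} b \\ -b^T A^{-1} & 1 \end{pmatrix}. \tag{22} $$

Dividing the vector $d$ by the diagonal number $e$ yields

$$ d / e = -A^{-1} b. \tag{23} $$

As we have proved in Lemma 4, $-b^T A^{-1} \ge (-\hat{b}^T \hat{A}^{-1}, 0) \ge 0$, i.e.,

$$ \frac{(L_{(V - \mathcal{L}_1)}^{-1})_{ij}}{(L_{(V - \mathcal{L}_1)}^{-1})_{jj}} \ge \frac{(L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2)}^{-1})_{ij}}{(L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2)}^{-1})_{jj}} \ge 0, \quad \forall v_i, v_j \notin \mathcal{L}_1 \cup \mathcal{L}_2. \tag{24} $$

Similarly, $\dfrac{(L_{(V - \mathcal{L}_1)}^{-1})_{ij}}{(L_{(V - \mathcal{L}_1)}^{-1})_{ii}} \ge \dfrac{(L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2)}^{-1})_{ij}}{(L_{(V - \mathcal{L}_1 \cup \mathcal{L}_2)}^{-1})_{ii}} \ge 0$. It suffices to multiply both sides of the above two inequalities and take square roots. □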
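Theorem 2 can be illustrated with the conditional correlation (21) on a small example. The sketch below assumes a random 8-node regularized graph and the arbitrary choices $\mathcal{L}_1 = \{0\}$ and $\mathcal{L}_2 = \{1, 2\}$; `cond_corr` is a hypothetical helper name.

```python
import numpy as np

def cond_corr(L, labeled):
    """Conditional correlation (21) of the nodes not in `labeled`."""
    u = [i for i in range(L.shape[0]) if i not in labeled]
    S = np.linalg.inv(L[np.ix_(u, u)])          # conditional covariance L_u^{-1}
    d = 1.0 / np.sqrt(np.diag(S))
    return u, S * np.outer(d, d)

rng = np.random.default_rng(2)
n = 8
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W + 0.2 * np.eye(n)

u1, C1 = cond_corr(L, [0])                      # conditioned on L1
u2, C2 = cond_corr(L, [0, 1, 2])                # conditioned on L1 ∪ L2
idx = [u1.index(v) for v in u2]                 # common nodes, located inside u1
C1_common = C1[np.ix_(idx, idx)]

# Correlations stay nonnegative and can only shrink as more nodes are observed.
aofs_holds = bool(np.all(C1_common >= C2 - 1e-12) and np.all(C2 >= 0))
print(aofs_holds)
```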
4 Extension to Active Survey and Tricks to Improve Efficiency



In an active survey problem [7], our goal is to actively query points to ultimately predict the proportion
of a given class. Embedded in GRF models, it changes the loss function to the Mean Squared
Error (MSE) $L_s(y_\mathcal{T}, f_\mathcal{T}) = \big(\sum_{t_i \in \mathcal{T}} (y_{t_i} - f_{t_i})\big)^2$ and the risk to $R_s(\mathcal{L}) = \mathbf{1}^T T L_{(V - \mathcal{L})}^{-1} T^T \mathbf{1}$.

For the subset selection problem with this new objective function, all results in § 3 hold and the
proofs are similar. Besides, we developed an algorithm with $O(N^{2.36} + kN^2)$ runtime complexity
for $k$ queries ($O(kN^{3.36})$ if implemented naively) similar to [2].

4.1 The Active Surveying Problem and The Proofs 

Similar to (4), define the subset selection (active surveying) problem as 

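$$ \arg\min_{\mathcal{L}} R(\mathcal{L}) = R_s(\mathcal{L}) = \mathbf{1}^T T L_{(V - \mathcal{L})}^{-1} T^T \mathbf{1} \quad \text{s.t. } \sum_{v \in \mathcal{L}} c_v \le C. \tag{25} $$

We also assume (5-8) and $R_\Delta(\mathcal{L}) := R(\emptyset) - R(\mathcal{L})$. To prove Theorem 1 via Lemma 2 and 3, the
only adjustment is with (20),

$$ R_\Delta(\mathcal{L}_1 \cup \{v\}) - R_\Delta(\mathcal{L}_1) = R_s(\mathcal{L}_1) - R_s(\mathcal{L}_1 \cup \{v\}) = \mathbf{1}^T T \left[ \begin{pmatrix} A & b \\ b^T & c \end{pmatrix}^{-1} - \begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix} \right] T^T \mathbf{1}
= \frac{\left( \mathbf{1}^T T \begin{pmatrix} A^{-1}(-b) \\ 1 \end{pmatrix} \right)^2}{c - b^T A^{-1} b}. \tag{26} $$

Still, because $-b \ge \begin{pmatrix} -\hat{b} \\ 0 \end{pmatrix} \ge 0$, $A^{-1} \ge \begin{pmatrix} \hat{A}^{-1} & 0 \\ 0 & 0 \end{pmatrix} \ge 0$, and $T \ge (\hat{T}, 0) \ge 0$, the above is no smaller than
its counterpart in $R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2 \cup \{v\}) - R_\Delta(\mathcal{L}_1 \cup \mathcal{L}_2)$. □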

4.2 Tricks to Improve Efficiency: With Precomputed Covariance 

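In Algo 1, the most time-consuming step is to compute $R(\mathcal{L} \cup \{v'\})$ for every possible $v' \in \mathcal{P}$,
which in general involves taking the inverse of $L_{(V - \mathcal{L} \cup \{v'\})}$. Zhu et al. [5] presented a fast way to
do this. Actually it can get even faster in the following way. Assume $L_{(V - \mathcal{L} \cup \{v'\})}^{-1} = \Sigma' = A^{-1}$ and
$L_{(V - \mathcal{L})}^{-1} = \Sigma = \begin{pmatrix} A & b \\ b^T & c \end{pmatrix}^{-1} = \begin{pmatrix} C & d \\ d^T & e \end{pmatrix}$, taking w.l.o.g. $v'$ to be the last column of $\Sigma$. Then

$$ \begin{pmatrix} C & d \\ d^T & e \end{pmatrix} = \begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix} + \frac{1}{c - b^T A^{-1} b} \begin{pmatrix} A^{-1} b b^T A^{-1} & -A^{-1} b \\ -b^T A^{-1} & 1 \end{pmatrix}, \tag{27} $$

$$ \begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} C & d \\ d^T & e \end{pmatrix} - \frac{1}{e} \begin{pmatrix} d \\ e \end{pmatrix} (d^T, e), \tag{28} $$

$$ \text{i.e. } \begin{pmatrix} \Sigma' & 0 \\ 0 & 0 \end{pmatrix} = \Sigma - \frac{1}{\Sigma_{v'v'}} \, \Sigma_{\cdot v'} \Sigma_{v' \cdot}. \tag{29} $$

In Algo 2, only linear time is needed to evaluate the marginal gain of a candidate because

$$ R_c(\mathcal{L} \cup \{v'\}) = \mathrm{tr}(\Sigma') = \mathrm{tr}(\Sigma) - \mathrm{tr}\Big(\frac{1}{\Sigma_{v'v'}} \Sigma_{\cdot v'} \Sigma_{v' \cdot}\Big) = \mathrm{const} - \frac{\Sigma_{v' \cdot} \Sigma_{\cdot v'}}{\Sigma_{v'v'}}, $$

$$ R_s(\mathcal{L} \cup \{v'\}) = \mathbf{1}^T \Sigma' \mathbf{1} = \mathbf{1}^T \Sigma \mathbf{1} - \frac{1}{\Sigma_{v'v'}} \mathbf{1}^T \Sigma_{\cdot v'} \Sigma_{v' \cdot} \mathbf{1} = \mathrm{const} - \frac{(\mathbf{1}^T \Sigma_{\cdot v'})^2}{\Sigma_{v'v'}}. $$

Algorithm 2: Fast progressive $R(\mathcal{L} \cup \{v'\})$ evaluation with precomputed covariance.

Input: Labeled set $\mathcal{L}$, current $R(\mathcal{L})$, $\Sigma$ (covariance of $y_u$ conditioned on $y_\ell$), queryable pool $\mathcal{P}$,
    and test set $\mathcal{T} \subseteq \mathcal{U}$ if applicable (otherwise $T \leftarrow I$ of size $|\mathcal{U}|$).
Output: $R(\mathcal{L} \cup \{v_{p_1}\}), \dots, R(\mathcal{L} \cup \{v_{p_m}\})$.
$T \leftarrow (\delta(v_{t_i}, v_{u_j}))_{i,j=1}^{|\mathcal{T}|, |\mathcal{U}|}$, or $T \leftarrow I$ of size $|\mathcal{U}|$ if $\mathcal{T}$ is not specified.
for $v_{p_s} \in \mathcal{P} - \mathcal{L}$ do
    $v' \leftarrow j$ if $v_{u_j} = v_{p_s}$.
    $R(\mathcal{L} \cup \{v_{p_s}\}) \leftarrow R(\mathcal{L}) - \dfrac{\|T \Sigma_{\cdot v'}\|^2}{\Sigma_{v'v'}}$ if classification, or $R(\mathcal{L}) - \dfrac{(\mathbf{1}^T T \Sigma_{\cdot v'})^2}{\Sigma_{v'v'}}$ if survey.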
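The rank-one identity (29) and the resulting linear-time scores can be sketched as below. The example assumes $T = I$ and a random regularized graph with no labels yet; the helper names `candidate_risks` and `downdate` are hypothetical.

```python
import numpy as np

def candidate_risks(Sigma, R_current, survey=False):
    """R(L ∪ {v'}) for every unlabeled v', from the precomputed Sigma (T = I)."""
    diag = np.diag(Sigma)
    if survey:
        reduction = Sigma.sum(axis=0) ** 2 / diag        # (1^T Sigma_{.v'})^2 / Sigma_{v'v'}
    else:
        reduction = (Sigma ** 2).sum(axis=0) / diag      # ||Sigma_{.v'}||^2 / Sigma_{v'v'}
    return R_current - reduction

def downdate(Sigma, j):
    """Covariance of the remaining nodes after querying unlabeled index j, via (29)."""
    S = Sigma - np.outer(Sigma[:, j], Sigma[j, :]) / Sigma[j, j]
    return np.delete(np.delete(S, j, axis=0), j, axis=1)  # drop the zeroed row and column

rng = np.random.default_rng(3)
n = 6
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W + 0.5 * np.eye(n)
Sigma = np.linalg.inv(L)                                  # no labels yet, so U = V

fast_c = candidate_risks(Sigma, np.trace(Sigma))          # classification risk per candidate
fast_s = candidate_risks(Sigma, Sigma.sum(), survey=True) # survey risk per candidate
print(int(np.argmin(fast_c)), int(np.argmin(fast_s)))
```

After a node is chosen, `downdate` gives the covariance for the next round, so no further matrix inversion is needed inside the greedy loop.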



4.3 Tricks to Improve Efficiency: Singular Laplacian 



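However, we still have one question unsolved: how to compute the first $L^{-1}$ when $L$ for a connected
graph is singular?

The algorithm for the classification problem $\arg\min_{v_i} \mathrm{tr}(L_{(V - \{v_i\})}^{-1})$ has been optimized in [2]. We can
follow a similar method to compute $\arg\min_{v_i} \mathbf{1}^T L_{(V - \{v_i\})}^{-1} \mathbf{1}$ and also the criterion with specified
test sets. Essentially, we want to avoid numerical inversion of large matrices as much as possible. In
fact, both the algorithm in [2] and the following require only one eigen-decomposition of $L$, which
has the same order of complexity as matrix inversion.

Definition 2 (First Query in Survey Problem). Suppose $L$ satisfies (5-7), i.e. every property including
connectivity except nonsingularity. Also suppose $L$ has eigen-decomposition $L = Q \Lambda Q^T$, where
$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_N)$ with $\lambda_1 = 0$, $\lambda_k > 0$, $\forall k \ne 1$, and $Q$ is the orthogonal matrix whose every
column is the normalized eigenvector corresponding to the eigenvalue in $\Lambda$. Denote the row vector
representation³ of $Q$ as $Q = \begin{pmatrix} r_1 \\ \vdots \\ r_N \end{pmatrix}$ and its miss-$i$th-row form $Q_{-i} = \begin{pmatrix} r_1 \\ \vdots \\ r_{i-1} \\ r_{i+1} \\ \vdots \\ r_N \end{pmatrix}$.
The first query in the survey problem asks to optimize

$$ \arg\min_{v_i} R_s(\{v_i\}) = \mathbf{1}^T L_{(V - \{v_i\})}^{-1} \mathbf{1} = \mathbf{1}^T (Q_{-i} \Lambda Q_{-i}^T)^{-1} \mathbf{1}. \tag{30} $$

³ Notice this $r_i$ representation is the only row vector representation in this paper.

Solution (First Query in Survey Problem).

For any fixed $i$, denote the $(N-1)$-by-$N$ matrix $\tilde{Q} = Q_{-i}$. Thus $\tilde{Q} \tilde{Q}^T = I_{N-1}$, $\tilde{Q}^T \tilde{Q} = I_N - r_i^T r_i$, and
$R_s(\{v_i\}) = \mathbf{1}^T (\tilde{Q} \Lambda \tilde{Q}^T)^{-1} \mathbf{1}$. Also note that $I_N + \Lambda$ is a
nonsingular diagonal matrix and denote $\tilde{L} = L_{(V - \{v_i\})} = \tilde{Q} \Lambda \tilde{Q}^T$. By the matrix inversion theorem,

$$ \tilde{L}^{-1} = \big( -\tilde{Q} \tilde{Q}^T + \tilde{Q} (I_N + \Lambda) \tilde{Q}^T \big)^{-1} \tag{31} $$

$$ = (-\tilde{Q} \tilde{Q}^T)^{-1} - (\tilde{Q} \tilde{Q}^T)^{-1} \tilde{Q} \big[ (I_N + \Lambda)^{-1} + \tilde{Q}^T (-\tilde{Q} \tilde{Q}^T)^{-1} \tilde{Q} \big]^{-1} \tilde{Q}^T (\tilde{Q} \tilde{Q}^T)^{-1} \tag{32} $$

$$ = -I_{N-1} - \tilde{Q} \big[ (I_N + \Lambda)^{-1} - \tilde{Q}^T \tilde{Q} \big]^{-1} \tilde{Q}^T \tag{33} $$

$$ = -I_{N-1} - \tilde{Q} \big[ \big( (I_N + \Lambda)^{-1} - I_N \big) + r_i^T r_i \big]^{-1} \tilde{Q}^T. \tag{34} $$

Since $L$ is a connected graph Laplacian, the normalized eigenvector for $\lambda_1 = 0$ is $\frac{1}{\sqrt{N}} \mathbf{1}$.
Therefore, we can denote $r_i = (\frac{1}{\sqrt{N}}, a_i^T)$, where $a_i$ is $(N-1)$-dimensional. Apply the matrix
inversion theorem again,

$$ \big[ \big( (I_N + \Lambda)^{-1} - I_N \big) + r_i^T r_i \big]^{-1} = \begin{pmatrix} \frac{1}{N} & \frac{1}{\sqrt{N}} a_i^T \\ \frac{1}{\sqrt{N}} a_i & M + a_i a_i^T \end{pmatrix}^{-1} \tag{35} $$

$$ = \begin{pmatrix} \frac{1}{m} & -\frac{1}{m \sqrt{N}} a_i^T B^{-1} \\ -\frac{1}{m \sqrt{N}} B^{-1} a_i & B^{-1} + \frac{1}{m N} B^{-1} a_i a_i^T B^{-1} \end{pmatrix}, \tag{36} $$

where $M = \mathrm{diag}\big( \frac{-\lambda_2}{1 + \lambda_2}, \dots, \frac{-\lambda_N}{1 + \lambda_N} \big)$, $B = M + a_i a_i^T$, $B^{-1} = M^{-1} - M^{-1} a_i \cdot \frac{1}{1 + a_i^T M^{-1} a_i} \cdot a_i^T M^{-1}$, and $m = \frac{1}{N} - \frac{1}{N} a_i^T B^{-1} a_i$.

Assign $\alpha_i = a_i^T M^{-1} a_i$ and we have

$$ a_i^T B^{-1} a_i = \alpha_i - \frac{\alpha_i^2}{1 + \alpha_i} = \frac{\alpha_i}{1 + \alpha_i}, \qquad m = \frac{1}{N (1 + \alpha_i)}. $$

Finally, because the first column of the orthogonal $Q$ is $\frac{1}{\sqrt{N}} \mathbf{1}$, we have $\mathbf{1}^T Q = (\sqrt{N}, 0^T)$ and

$$ R_s(\{v_i\}) = \mathbf{1}^T \tilde{L}^{-1} \mathbf{1} \tag{37} $$

$$ = -(N - 1) - (\mathbf{1}^T Q - r_i) \big[ \big( (I_N + \Lambda)^{-1} - I_N \big) + r_i^T r_i \big]^{-1} (\mathbf{1}^T Q - r_i)^T \tag{38} $$

$$ = -(N - 1) - \Big( \tfrac{N-1}{\sqrt{N}}, \, -a_i^T \Big) \begin{pmatrix} \frac{1}{m} & -\frac{1}{m \sqrt{N}} a_i^T B^{-1} \\ -\frac{1}{m \sqrt{N}} B^{-1} a_i & B^{-1} + \frac{1}{m N} B^{-1} a_i a_i^T B^{-1} \end{pmatrix} \begin{pmatrix} \tfrac{N-1}{\sqrt{N}} \\ -a_i \end{pmatrix} \tag{39} $$

$$ = -(N - 1) - \Big[ \frac{(N-1)^2}{N m} + \frac{2 (N-1)}{N m} a_i^T B^{-1} a_i + a_i^T B^{-1} a_i + \frac{1}{N m} (a_i^T B^{-1} a_i)^2 \Big] \tag{40} $$

$$ = -(N - 1) - \Big[ (N-1)^2 (1 + \alpha_i) + 2 (N-1) \alpha_i + \frac{\alpha_i}{1 + \alpha_i} + \frac{\alpha_i^2}{1 + \alpha_i} \Big] \tag{41} $$

$$ = -N (N - 1) - N^2 \alpha_i, \tag{42} $$

$$ \text{where } \alpha_i = (q_{i,2}, \dots, q_{i,N}) \, \mathrm{diag}\Big( \frac{-1 - \lambda_2}{\lambda_2}, \dots, \frac{-1 - \lambda_N}{\lambda_N} \Big) (q_{i,2}, \dots, q_{i,N})^T. \tag{43} $$

When a test set $\mathcal{T}$ is specified, since $v_i \in \mathcal{T}$,

$$ R_s(\{v_i\}) = \mathbf{1}^T T \tilde{L}^{-1} T^T \mathbf{1} = -(|\mathcal{T}| - 1) - \mathbf{1}^T T \tilde{Q} \big[ (I_N + \Lambda)^{-1} - \tilde{Q}^T \tilde{Q} \big]^{-1} \tilde{Q}^T T^T \mathbf{1} \tag{44} $$

$$ = -(|\mathcal{T}| - 1) - \mathbf{1}^T T \tilde{Q} \big[ \big( (I_N + \Lambda)^{-1} - I_N \big) + r_i^T r_i \big]^{-1} \tilde{Q}^T T^T \mathbf{1}. \tag{45} $$

A similar algorithm can be derived, though the runtime complexity may have a factor $|\mathcal{T}|$. □

Algorithm 3: Fast first-step $R_s(\{v\})$ evaluation with singular Laplacian.

Input: Singular connected graph Laplacian $L$.
Output: $R_s(\{v_i\}) = \mathbf{1}^T L_{(V - \{v_i\})}^{-1} \mathbf{1}$, $i = 1, \dots, N$.
Perform eigen-decomposition $L = Q \Lambda Q^T$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_N)$ in ascending order.
Denote $M^{-1} \leftarrow \mathrm{diag}\big( 0, \frac{-1 - \lambda_2}{\lambda_2}, \dots, \frac{-1 - \lambda_N}{\lambda_N} \big)$ and $Q = \begin{pmatrix} r_1 \\ \vdots \\ r_N \end{pmatrix}$.
for $i = 1, \dots, N$ do
    $\alpha_i \leftarrow r_i M^{-1} r_i^T$.
    $R_s(\{v_i\}) = -N (N - 1) - N^2 \alpha_i$.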
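Algorithm 3 can be sketched in a few lines of NumPy and compared against direct inversion of each reduced Laplacian. The function name `first_query_survey_risks` and the random dense test graph are assumptions of this example; `np.linalg.eigh` is used because it returns the eigenvalues of a symmetric matrix in ascending order.

```python
import numpy as np

def first_query_survey_risks(L):
    """R_s({v_i}) = 1^T L_{(V-{v_i})}^{-1} 1 for all i, from one eigendecomposition."""
    N = L.shape[0]
    lam, Q = np.linalg.eigh(L)                 # lam[0] is ~0 for a connected graph
    m_inv = np.zeros(N)
    m_inv[1:] = -(1.0 + lam[1:]) / lam[1:]     # diagonal of M^{-1}, first entry 0
    alpha = (Q ** 2) @ m_inv                   # alpha_i = r_i M^{-1} r_i^T, r_i = row i of Q
    return -N * (N - 1) - N ** 2 * alpha

rng = np.random.default_rng(4)
n = 7
W = np.triu(rng.random((n, n)), 1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W                 # singular Laplacian of a connected graph

fast = first_query_survey_risks(L)

# Reference values: invert each (N-1) x (N-1) principal submatrix directly.
direct = np.empty(n)
for i in range(n):
    keep = [j for j in range(n) if j != i]
    direct[i] = np.linalg.inv(L[np.ix_(keep, keep)]).sum()
print(int(np.argmin(fast)))
```

The fast path costs one eigendecomposition plus $O(N^2)$ work, whereas the reference loop performs $N$ separate inversions.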
5 Experiment



We ran several active learning methods on the DBLP coauthorship graph dataset⁴ in four
areas: machine learning, data mining, information retrieval and database. Edge weights are the
number of papers coauthored. We took its largest connected component, which contains 1711 nodes
and 0.3% of all possible edges. We compared the V-optimality criterion (§ 2), mutual information gain
($\max_{\mathcal{L}} \mathrm{MI}(\mathcal{L}; V - \mathcal{L})$) [1], and random selection. For fair comparison, every method was assigned
the same random seed to start, and the curves show the mean and the standard error of the mean after
120 repetitions (Figure 1). The V-optimality criterion performs better than the others.




Figure 1: Batch active learning to classify the unlabeled authors on DBLP coauthorship graph (x-axis: number of queries).



6 Conclusion 



In this paper, we introduced the GRF model (1) and the Gaussian harmonic prediction (2). Batch
active learning with the V-optimality criterion, whose risk function is (3), can be formulated as
the subset selection problem (4). Our major contribution is to prove the submodularity conditions
(9-11) and a $(1 - 1/e)$ optimality bound (12) for a greedy selection algorithm (Algo 1) when the
graph Laplacian is nonsingular (5-8), via either extracting a subgraph from the original connected
graph or regularizing the GRF model. Furthermore, the fact that all GRFs meet the AofS condition
(Theorem 2) may shed light on this otherwise obscure condition.

In § 4, we also proposed an active survey problem and its related risk $R_s(\mathcal{L})$. We can show that this
batch active survey problem also meets the submodularity conditions and its greedy subset selection
algorithm achieves a similar $(1 - 1/e)$ optimality bound.